Overview

Dataset statistics

Number of variables: 12
Number of observations: 696
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 4
Duplicate rows (%): 0.6%
Total size in memory: 65.4 KiB
Average record size in memory: 96.2 B
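The memory and duplicate figures above are mutually consistent, which can be checked with a little arithmetic. The sketch below assumes 1 KiB = 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
total_size_kib = 65.4
n_observations = 696
n_duplicates = 4

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
duplicate_pct = 100 * n_duplicates / n_observations

print(f"{avg_record_bytes:.1f} B per record")
print(f"{duplicate_pct:.1f}% duplicate rows")
```

Both values round to the figures the report shows (96.2 B and 0.6%).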

Variable types

Categorical: 5
Numeric: 7

Alerts

Dataset has 4 (0.6%) duplicate rows [Duplicates]
profile pic is highly correlated with #posts and 2 other fields [High correlation]
nums/length username is highly correlated with fake [High correlation]
description length is highly correlated with #posts and 2 other fields [High correlation]
#posts is highly correlated with profile pic and 4 other fields [High correlation]
#followers is highly correlated with profile pic and 4 other fields [High correlation]
#follows is highly correlated with #posts and 1 other field [High correlation]
fake is highly correlated with profile pic and 4 other fields [High correlation]
profile pic is highly correlated with fake [High correlation]
nums/length username is highly correlated with fake [High correlation]
fake is highly correlated with profile pic and 1 other field [High correlation]
profile pic is highly correlated with #posts and 1 other field [High correlation]
nums/length username is highly correlated with fake [High correlation]
description length is highly correlated with fake [High correlation]
#posts is highly correlated with profile pic and 2 other fields [High correlation]
#followers is highly correlated with #posts and 2 other fields [High correlation]
#follows is highly correlated with #followers [High correlation]
fake is highly correlated with profile pic and 4 other fields [High correlation]
fake is highly correlated with profile pic [High correlation]
profile pic is highly correlated with fake [High correlation]
profile pic is highly correlated with description length and 1 other field [High correlation]
nums/length username is highly correlated with nums/length fullname and 1 other field [High correlation]
nums/length fullname is highly correlated with nums/length username and 1 other field [High correlation]
name==username is highly correlated with nums/length fullname [High correlation]
description length is highly correlated with profile pic and 2 other fields [High correlation]
external URL is highly correlated with description length and 1 other field [High correlation]
#posts is highly correlated with #followers and 1 other field [High correlation]
#followers is highly correlated with #posts and 1 other field [High correlation]
#follows is highly correlated with #posts and 1 other field [High correlation]
fake is highly correlated with profile pic and 3 other fields [High correlation]
fake is uniformly distributed [Uniform]
nums/length username has 363 (52.2%) zeros [Zeros]
fullname words has 64 (9.2%) zeros [Zeros]
nums/length fullname has 622 (89.4%) zeros [Zeros]
description length has 395 (56.8%) zeros [Zeros]
#posts has 185 (26.6%) zeros [Zeros]
#followers has 20 (2.9%) zeros [Zeros]
#follows has 11 (1.6%) zeros [Zeros]

Reproduction

Analysis started: 2022-07-04 06:06:52.746499
Analysis finished: 2022-07-04 06:07:01.263254
Duration: 8.52 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

profile pic
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Value | Count
1 | 495
0 | 201

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 495 | 71.1%
0 | 201 | 28.9%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of category frequencies]

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

nums/length username
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 58
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1666091954
Minimum: 0
Maximum: 0.92
Zeros: 363
Zeros (%): 52.2%
Negative: 0
Negative (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0.33
95-th percentile: 0.57
Maximum: 0.92
Range: 0.92
Interquartile range (IQR): 0.33

Descriptive statistics

Standard deviation: 0.2189635419
Coefficient of variation (CV): 1.314234435
Kurtosis: 1.018794568
Mean: 0.1666091954
Median Absolute Deviation (MAD): 0
Skewness: 1.262005362
Sum: 115.96
Variance: 0.04794503266
Monotonicity: Not monotonic
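
Several of these statistics are derived from one another: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range and IQR are differences of quantiles. A minimal check using the figures reported for nums/length username (variable names are illustrative):

```python
import math

# Figures copied from the nums/length username statistics.
mean = 0.1666091954
std = 0.2189635419
q1, q3 = 0.0, 0.33
minimum, maximum = 0.0, 0.92

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.6f} variance={variance:.8f} IQR={iqr} range={value_range}")
```

The results agree with the reported CV (1.314234435) and variance (0.04794503266) up to rounding.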
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 363 | 52.2%
0.33 | 31 | 4.5%
0.5 | 18 | 2.6%
0.44 | 18 | 2.6%
0.38 | 17 | 2.4%
0.25 | 15 | 2.2%
0.22 | 12 | 1.7%
0.57 | 11 | 1.6%
0.43 | 11 | 1.6%
0.14 | 11 | 1.6%
Other values (48) | 189 | 27.2%

Smallest values

Value | Count | Frequency (%)
0 | 363 | 52.2%
0.05 | 1 | 0.1%
0.06 | 2 | 0.3%
0.07 | 5 | 0.7%
0.08 | 4 | 0.6%
0.09 | 5 | 0.7%
0.1 | 9 | 1.3%
0.11 | 3 | 0.4%
0.12 | 8 | 1.1%
0.13 | 3 | 0.4%

Largest values

Value | Count | Frequency (%)
0.92 | 1 | 0.1%
0.91 | 2 | 0.3%
0.89 | 5 | 0.7%
0.88 | 5 | 0.7%
0.86 | 1 | 0.1%
0.83 | 1 | 0.1%
0.8 | 1 | 0.1%
0.75 | 1 | 0.1%
0.73 | 1 | 0.1%
0.71 | 1 | 0.1%

fullname words
Real number (ℝ≥0)

ZEROS

Distinct: 11
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.475574713
Minimum: 0
Maximum: 12
Zeros: 64
Zeros (%): 9.2%
Negative: 0
Negative (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 1
Q3: 2
95-th percentile: 3
Maximum: 12
Range: 12
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.076621929
Coefficient of variation (CV): 0.7296288825
Kurtosis: 23.2558394
Mean: 1.475574713
Median Absolute Deviation (MAD): 1
Skewness: 3.331094173
Sum: 1027
Variance: 1.159114777
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=11)]

Common values

Value | Count | Frequency (%)
1 | 347 | 49.9%
2 | 227 | 32.6%
0 | 64 | 9.2%
3 | 38 | 5.5%
4 | 8 | 1.1%
5 | 6 | 0.9%
6 | 2 | 0.3%
12 | 1 | 0.1%
10 | 1 | 0.1%
9 | 1 | 0.1%

Smallest values

Value | Count | Frequency (%)
0 | 64 | 9.2%
1 | 347 | 49.9%
2 | 227 | 32.6%
3 | 38 | 5.5%
4 | 8 | 1.1%
5 | 6 | 0.9%
6 | 2 | 0.3%
7 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%

Largest values

Value | Count | Frequency (%)
12 | 1 | 0.1%
10 | 1 | 0.1%
9 | 1 | 0.1%
7 | 1 | 0.1%
6 | 2 | 0.3%
5 | 6 | 0.9%
4 | 8 | 1.1%
3 | 38 | 5.5%
2 | 227 | 32.6%
1 | 347 | 49.9%

nums/length fullname
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 27
Distinct (%): 3.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04216954023
Minimum: 0
Maximum: 1
Zeros: 622
Zeros (%): 89.4%
Negative: 0
Negative (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0.33
Maximum: 1
Range: 1
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.1436643971
Coefficient of variation (CV): 3.406828632
Kurtosis: 21.64092909
Mean: 0.04216954023
Median Absolute Deviation (MAD): 0
Skewness: 4.324611964
Sum: 29.35
Variance: 0.02063945898
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=27)]

Common values

Value | Count | Frequency (%)
0 | 622 | 89.4%
0.33 | 13 | 1.9%
0.4 | 11 | 1.6%
1 | 7 | 1.0%
0.25 | 6 | 0.9%
0.5 | 4 | 0.6%
0.24 | 3 | 0.4%
0.31 | 3 | 0.4%
0.44 | 3 | 0.4%
0.22 | 3 | 0.4%
Other values (17) | 21 | 3.0%

Smallest values

Value | Count | Frequency (%)
0 | 622 | 89.4%
0.08 | 1 | 0.1%
0.1 | 1 | 0.1%
0.11 | 1 | 0.1%
0.12 | 2 | 0.3%
0.14 | 1 | 0.1%
0.18 | 2 | 0.3%
0.2 | 1 | 0.1%
0.22 | 3 | 0.4%
0.24 | 3 | 0.4%

Largest values

Value | Count | Frequency (%)
1 | 7 | 1.0%
0.89 | 1 | 0.1%
0.57 | 1 | 0.1%
0.56 | 1 | 0.1%
0.5 | 4 | 0.6%
0.46 | 1 | 0.1%
0.44 | 3 | 0.4%
0.43 | 2 | 0.3%
0.4 | 11 | 1.6%
0.38 | 1 | 0.1%

name==username
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Value | Count
0 | 671
1 | 25

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 671 | 96.4%
1 | 25 | 3.6%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of category frequencies]

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

description length
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 114
Distinct (%): 16.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.41235632
Minimum: 0
Maximum: 150
Zeros: 395
Zeros (%): 56.8%
Negative: 0
Negative (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 35
95-th percentile: 123
Maximum: 150
Range: 150
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 38.59572149
Coefficient of variation (CV): 1.64851931
Kurtosis: 2.371692232
Mean: 23.41235632
Median Absolute Deviation (MAD): 0
Skewness: 1.802317793
Sum: 16295
Variance: 1489.629718
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 395 | 56.8%
35 | 8 | 1.1%
2 | 7 | 1.0%
18 | 7 | 1.0%
19 | 7 | 1.0%
43 | 6 | 0.9%
50 | 6 | 0.9%
30 | 6 | 0.9%
26 | 6 | 0.9%
24 | 6 | 0.9%
Other values (104) | 242 | 34.8%

Smallest values

Value | Count | Frequency (%)
0 | 395 | 56.8%
1 | 4 | 0.6%
2 | 7 | 1.0%
3 | 2 | 0.3%
4 | 4 | 0.6%
5 | 5 | 0.7%
6 | 3 | 0.4%
7 | 1 | 0.1%
8 | 3 | 0.4%
9 | 4 | 0.6%

Largest values

Value | Count | Frequency (%)
150 | 2 | 0.3%
149 | 4 | 0.6%
148 | 4 | 0.6%
147 | 1 | 0.1%
146 | 3 | 0.4%
143 | 1 | 0.1%
140 | 1 | 0.1%
139 | 1 | 0.1%
138 | 3 | 0.4%
137 | 2 | 0.3%

external URL
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Value | Count
0 | 617
1 | 79

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 617 | 88.6%
1 | 79 | 11.4%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of category frequencies]

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

private
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Value | Count
0 | 439
1 | 257

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1

Common Values

Value | Count | Frequency (%)
0 | 439 | 63.1%
1 | 257 | 36.9%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of category frequencies]

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

#posts
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 211
Distinct (%): 30.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 103.2442529
Minimum: 0
Maximum: 7389
Zeros: 185
Zeros (%): 26.6%
Negative: 0
Negative (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 9
Q3: 77
95-th percentile: 463.5
Maximum: 7389
Range: 7389
Interquartile range (IQR): 77

Descriptive statistics

Standard deviation: 378.0281678
Coefficient of variation (CV): 3.661493568
Kurtosis: 225.4104391
Mean: 103.2442529
Median Absolute Deviation (MAD): 9
Skewness: 13.14072017
Sum: 71858
Variance: 142905.2957
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 185 | 26.6%
1 | 36 | 5.2%
2 | 27 | 3.9%
3 | 21 | 3.0%
4 | 17 | 2.4%
5 | 17 | 2.4%
8 | 16 | 2.3%
9 | 14 | 2.0%
6 | 13 | 1.9%
7 | 10 | 1.4%
Other values (201) | 340 | 48.9%

Smallest values

Value | Count | Frequency (%)
0 | 185 | 26.6%
1 | 36 | 5.2%
2 | 27 | 3.9%
3 | 21 | 3.0%
4 | 17 | 2.4%
5 | 17 | 2.4%
6 | 13 | 1.9%
7 | 10 | 1.4%
8 | 16 | 2.3%
9 | 14 | 2.0%

Largest values

Value | Count | Frequency (%)
7389 | 1 | 0.1%
4494 | 1 | 0.1%
1879 | 1 | 0.1%
1570 | 1 | 0.1%
1232 | 1 | 0.1%
1164 | 1 | 0.1%
1159 | 1 | 0.1%
1065 | 1 | 0.1%
1020 | 1 | 0.1%
990 | 1 | 0.1%

#followers
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 435
Distinct (%): 62.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79149.90517
Minimum: 0
Maximum: 15338538
Zeros: 20
Zeros (%): 2.9%
Negative: 0
Negative (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4
Q1: 42
median: 165.5
Q3: 693
95-th percentile: 13264.25
Maximum: 15338538
Range: 15338538
Interquartile range (IQR): 651

Descriptive statistics

Standard deviation: 842887.5382
Coefficient of variation (CV): 10.64925519
Kurtosis: 228.2298983
Mean: 79149.90517
Median Absolute Deviation (MAD): 150.5
Skewness: 14.41101535
Sum: 55088334
Variance: 7.10459402 × 10^11
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 20 | 2.9%
49 | 10 | 1.4%
16 | 9 | 1.3%
15 | 8 | 1.1%
21 | 7 | 1.0%
1 | 7 | 1.0%
5 | 7 | 1.0%
39 | 7 | 1.0%
42 | 6 | 0.9%
77 | 6 | 0.9%
Other values (425) | 609 | 87.5%

Smallest values

Value | Count | Frequency (%)
0 | 20 | 2.9%
1 | 7 | 1.0%
2 | 5 | 0.7%
3 | 2 | 0.3%
4 | 3 | 0.4%
5 | 7 | 1.0%
6 | 3 | 0.4%
7 | 3 | 0.4%
8 | 3 | 0.4%
9 | 4 | 0.6%

Largest values

Value | Count | Frequency (%)
15338538 | 1 | 0.1%
12397719 | 1 | 0.1%
6741307 | 1 | 0.1%
5315651 | 1 | 0.1%
4021842 | 1 | 0.1%
3896490 | 1 | 0.1%
1027419 | 1 | 0.1%
890969 | 1 | 0.1%
732075 | 1 | 0.1%
669987 | 1 | 0.1%

#follows
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 468
Distinct (%): 67.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 555.0862069
Minimum: 0
Maximum: 7500
Zeros: 11
Zeros (%): 1.6%
Negative: 0
Negative (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 6
Q1: 61
median: 252
Q3: 601.75
95-th percentile: 2067.5
Maximum: 7500
Range: 7500
Interquartile range (IQR): 540.75

Descriptive statistics

Standard deviation: 1023.613869
Coefficient of variation (CV): 1.84406288
Kurtosis: 23.08467126
Mean: 555.0862069
Median Absolute Deviation (MAD): 216.5
Skewness: 4.407066297
Sum: 386340
Variance: 1047785.354
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 11 | 1.6%
22 | 7 | 1.0%
1 | 7 | 1.0%
2 | 7 | 1.0%
64 | 6 | 0.9%
76 | 6 | 0.9%
26 | 6 | 0.9%
694 | 6 | 0.9%
151 | 5 | 0.7%
333 | 5 | 0.7%
Other values (458) | 630 | 90.5%

Smallest values

Value | Count | Frequency (%)
0 | 11 | 1.6%
1 | 7 | 1.0%
2 | 7 | 1.0%
3 | 2 | 0.3%
4 | 4 | 0.6%
5 | 2 | 0.3%
6 | 4 | 0.6%
7 | 2 | 0.3%
8 | 5 | 0.7%
9 | 2 | 0.3%

Largest values

Value | Count | Frequency (%)
7500 | 2 | 0.3%
7453 | 1 | 0.1%
7399 | 1 | 0.1%
7369 | 1 | 0.1%
7272 | 1 | 0.1%
7202 | 1 | 0.1%
6172 | 1 | 0.1%
6153 | 1 | 0.1%
5514 | 1 | 0.1%
4664 | 1 | 0.1%

fake
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIFORM

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Value | Count
0 | 348
1 | 348

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 348 | 50.0%
1 | 348 | 50.0%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of category frequencies]

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Interactions

[Figures: 49 pairwise interaction plots of the numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
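
The calculation described above can be sketched in plain Python: rank both variables (tied values share the mean of their positions), then apply the covariance-over-standard-deviations formula to the ranks. This is an illustrative sketch, not the code the report generator runs; the function names are assumptions.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # The 1/(n-1) factors in the covariance and standard deviations cancel.
    return cov / math.sqrt(var_x * var_y)

# Toy data: y = x**2 is nonlinear but monotonic, so the ranks agree exactly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman_rho(x, y)
print(rho)
```

Because only ranks enter the formula, any strictly increasing transformation of either variable leaves ρ unchanged.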

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
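
As a minimal sketch of this formula, the snippet below computes r on toy data and also illustrates the invariance under changes of location and scale mentioned above. The function name and the data are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n-1) factors in the covariance and standard deviations cancel.
    return cov / math.sqrt(var_x * var_y)

# Toy data, not taken from the profiled dataset.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shifting and rescaling each variable (with positive factors) leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```

A negative scale factor on one variable would flip the sign of r but not its magnitude.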

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
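
The pair-counting definition above corresponds to the variant usually called τ-a, sketched below in plain Python. Library implementations such as scipy.stats.kendalltau default to the tie-adjusted τ-b, so results can differ when the data contain ties; the function name here is an assumption.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # Pairs tied in x or y count as neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)

# Toy data: of the 6 pairs, 5 are concordant and 1 is discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

This direct version examines every pair and is therefore quadratic in the number of observations, which is fine for a dataset of 696 rows.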

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
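
The bias-corrected estimator can be sketched as follows for a contingency table of counts. This follows the published correction formula without a continuity correction and assumes every row and column total is non-zero; it is not the report generator's own code, and the function name and toy tables are illustrative.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction in this sketch).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Toy tables, not taken from the profiled dataset.
v_independent = cramers_v_corrected([[10, 10], [10, 10]])
v_perfect = cramers_v_corrected([[20, 0], [0, 20]])
print(v_independent, v_perfect)
```

The two toy tables mark the ends of the scale: independent variables give 0 and a perfectly associated pair gives 1.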

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

profile pic | nums/length username | fullname words | nums/length fullname | name==username | description length | external URL | private | #posts | #followers | #follows | fake
010.2700.0053003210009550
110.0020.00440028627405330
210.1020.0000113159980
310.0010.0082006794146510
410.0020.0000161511260
510.0040.0081103446699871500
610.0020.005000161221770
710.0020.00000331078760
810.0000.00710072182427130
910.0020.004010213129458130

Last rows

profile pic | nums/length username | fullname words | nums/length fullname | name==username | description length | external URL | private | #posts | #followers | #follows | fake
68610.0010.00000011936691
68710.0010.2500000492351
68800.4410.00000001371
68910.4510.4000002742701
69010.3310.000000888761
69110.2910.000000131148111
69210.4010.00000041501641
69310.0020.000000383335721
69400.1710.000000121916951
69510.4410.000000339681

Duplicate rows

Most frequently occurring

profile pic | nums/length username | fullname words | nums/length fullname | name==username | description length | external URL | private | #posts | #followers | #follows | fake | # duplicates
000.0010.0100006969412
110.0020.004800222528265202
210.2710.000000456412
310.9110.000000752612